144 research outputs found
Expression Syntax Information Bottleneck for Math Word Problems
Math Word Problem (MWP) solving aims to automatically solve mathematical
questions given in text. Previous studies tend to design complex models to capture
additional information in the original text so as to enable the model to gain
more comprehensive features. In this paper, we turn our attention in the
opposite direction, and work on how to discard redundant features containing
spurious correlations for MWP. To this end, we design an Expression Syntax
Information Bottleneck method for MWP (called ESIB) based on variational
information bottleneck, which extracts essential features of expression syntax
tree while filtering latent-specific redundancy containing syntax-irrelevant
features. The key idea of ESIB is to encourage multiple models to predict the
same expression syntax tree for different problem representations of the same
problem by mutual learning so as to capture consistent information of
expression syntax tree and discard latent-specific redundancy. To improve the
generalization ability of the model and generate more diverse expressions, we
design a self-distillation loss to encourage the model to rely more on the
expression syntax information in the latent space. Experimental results on two
large-scale benchmarks show that our model not only achieves state-of-the-art
results but also generates more diverse solutions. The code is available.Comment: This paper has been accepted by SIGIR 2022. The code can be found at
https://github.com/menik1126/math_ESI
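The abstract's core consistency idea, two models predicting the same expression syntax tree for different representations of one problem via mutual learning, can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function names are hypothetical, and only the symmetric-KL mutual-learning component (not the variational information bottleneck or self-distillation terms) is shown:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL(p || q), summed over the expression-vocabulary axis
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def mutual_learning_loss(logits_a, logits_b):
    # Symmetric KL between two models' distributions over expression
    # syntax tree tokens, averaged over decoding steps. Minimizing it
    # pushes both models toward consistent syntax-tree predictions.
    p, q = softmax(logits_a), softmax(logits_b)
    return float(np.mean(kl(p, q) + kl(q, p)))
```

In training, this loss would be added to each model's usual tree-decoding loss so that only information shared across problem representations survives.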
More than Encoder: Introducing Transformer Decoder to Upsample
Medical image segmentation methods downsample images for feature extraction
and then upsample them to restore resolution for pixel-level predictions. In
such a schema, the upsampling technique is vital for restoring information and
achieving better performance. However, existing upsampling techniques leverage
little information from the downsampling path. Local, detailed features from
shallower layers, such as boundaries and tissue texture, are particularly
important in medical segmentation compared with natural image segmentation. To this end, we
propose a novel upsample approach for medical image segmentation, Window
Attention Upsample (WAU), which upsamples features conditioned on local and
detailed features from downsampling path in local windows by introducing
attention decoders of Transformer. WAU could serve as a general upsample method
and be incorporated into any segmentation model that possesses lateral
connections. We first propose the Attention Upsample which consists of
Attention Decoder (AD) and bilinear upsample. AD leverages pixel-level
attention to model long-range dependency and global information for a better
upsample. Bilinear upsample is introduced as the residual connection to
complement the upsampled features. Moreover, considering the extensive memory
and computation cost of pixel-level attention, we further design a window
attention scheme to restrict attention computation in local windows instead of
the global range. We evaluate our method (WAU) on classic U-Net structure with
lateral connections and achieve state-of-the-art performance on Synapse
multi-organ segmentation, Medical Segmentation Decathlon (MSD) Brain, and
Automatic Cardiac Diagnosis Challenge (ACDC) datasets. We also validate the
effectiveness of our method on multiple classic architectures and achieve
consistent improvement.
Comment: Accepted by BIBM202
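The mechanism described above, attention restricted to local windows with queries from the upsampled decoder path, keys/values from the encoder's lateral features, and a bilinear-upsample residual, can be sketched as follows. This is a simplified illustration, not the paper's code: single-head attention without learned projections, and the function name is hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention_upsample(dec, enc, win=4):
    # dec: coarse decoder features already bilinearly upsampled to
    # (H, W, C); enc: lateral encoder features of the same shape.
    # Attention is computed independently inside non-overlapping
    # win x win windows: queries come from the upsampled decoder
    # path, keys/values from the encoder path.
    H, W, C = dec.shape
    out = np.empty_like(dec)
    for i in range(0, H, win):
        for j in range(0, W, win):
            q = dec[i:i+win, j:j+win].reshape(-1, C)
            kv = enc[i:i+win, j:j+win].reshape(-1, C)
            attn = softmax(q @ kv.T / np.sqrt(C))
            out[i:i+win, j:j+win] = (attn @ kv).reshape(win, win, C)
    # bilinear upsample serves as the residual connection
    return out + dec
```

Restricting attention to windows reduces the cost from O((HW)^2) for global pixel-level attention to O(HW * win^2), which is the memory/computation saving the abstract motivates.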
Self-consistent Reasoning For Solving Math Word Problems
Math word problem (MWP) solving is a task that automatically derives a solution
expression from a given math problem in text. Previous studies suffer
from spurious correlations between input text and output expression. To
mitigate this issue, we propose a self-consistent reasoning framework called
SCR, which attempts to adopt a pruning strategy to correct the output
distribution shift so as to implicitly fix those spurious correlative samples.
Specifically, we first obtain a sub-network by pruning a roberta2tree model,
and use the gap in output distribution between the original
roberta2tree model and the pruned sub-network to expose spurious correlative
samples. We then calibrate the output distribution shift by applying symmetric
Kullback-Leibler divergence to alleviate spurious correlations. In addition,
SCR generates equivalent expressions, thereby capturing the original text's
logic rather than relying on hints from the original text. Extensive experiments on
two large-scale benchmarks demonstrate that our model substantially outperforms
the strong baseline methods.
Comment: Submitted to IEEE ICASSP 202
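The "gap" idea above, comparing the full model's output distribution against its pruned sub-network's to expose spurious-correlative samples, can be sketched in a few lines of numpy. This is an illustrative assumption of how such a gap might be scored (the function names and threshold are hypothetical, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sym_kl(p, q, eps=1e-12):
    # Symmetric Kullback-Leibler divergence per sample
    return np.sum(p * np.log((p + eps) / (q + eps))
                  + q * np.log((q + eps) / (p + eps)), axis=-1)

def expose_spurious(full_logits, pruned_logits, threshold=0.5):
    # Per-sample output-distribution gap between the roberta2tree
    # model and its pruned sub-network; large gaps flag samples the
    # model likely solves via spurious correlations.
    gap = sym_kl(softmax(full_logits), softmax(pruned_logits))
    return gap, gap > threshold
```

In the framework described, this same symmetric-KL term would also serve as the calibration loss that aligns the two distributions during training.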
Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification
Gait-based person re-identification (Re-ID) is valuable for safety-critical
applications, and using only 3D skeleton data to extract discriminative gait
features for person Re-ID is an emerging open topic. Existing methods either
adopt hand-crafted features or learn gait features by traditional supervised
learning paradigms. Unlike previous methods, we for the first time propose a
generic gait encoding approach that can utilize unlabeled skeleton data to
learn gait representations in a self-supervised manner. Specifically, we first
propose to introduce self-supervision by learning to reconstruct input skeleton
sequences in reverse order, which facilitates learning richer high-level
semantics and better gait representations. Second, inspired by the fact that
motion's continuity endows temporally adjacent skeletons with higher
correlations ("locality"), we propose a locality-aware attention mechanism that
encourages learning larger attention weights for temporally adjacent skeletons
when reconstructing the current skeleton, so as to learn locality when encoding
gait. Finally, we propose Attention-based Gait Encodings (AGEs), which are
built using context vectors learned by locality-aware attention, as final gait
representations. AGEs are directly utilized to realize effective person Re-ID.
Our approach typically improves existing skeleton-based methods by 10-20%
Rank-1 accuracy, and it achieves comparable or even superior performance to
multi-modal methods with extra RGB or depth information. Our codes are
available at https://github.com/Kali-Hac/SGE-LA.
Comment: Accepted at IJCAI 2020 Main Track. Sole copyright holder is IJCAI.
Codes are available at https://github.com/Kali-Hac/SGE-L
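The locality-aware attention idea, biasing attention toward temporally adjacent skeleton frames when reconstructing the current frame, can be illustrated with a minimal sketch. This is one plausible realization (a Gaussian locality bias added to raw attention scores), not the paper's exact formulation; the function name and sigma parameter are assumptions:

```python
import numpy as np

def locality_aware_attention(scores, t, sigma=2.0):
    # scores: raw attention scores over T input skeleton frames when
    # reconstructing frame t. A Gaussian bias centered at t encourages
    # larger weights for temporally adjacent skeletons ("locality"),
    # reflecting the higher correlation of neighboring frames in a
    # continuous motion.
    T = scores.shape[0]
    bias = -((np.arange(T) - t) ** 2) / (2 * sigma ** 2)
    z = scores + bias
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()
```

The context vectors produced by such weighted sums over frames are what the abstract aggregates into Attention-based Gait Encodings (AGEs) for Re-ID.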
Towards Intelligent Decision Making in Emotion-aware Applications
In this paper, we propose an intelligent emotion-aware system (IES), which aims to provide a systematic approach that can make use of online technology to improve the intelligence of different emotion-aware mobile applications. IES is constructed to provide multi-dimensional online social community data collection and processing approaches for decision making, so as to recommend intelligent services for emotion-aware mobile applications. Furthermore, we present the flow of the intelligent decision-making process designed on IES, and highlight the implementation and orchestration of several key technologies and schemes applied in this system for different emotion-aware mobile applications at run-time. We demonstrate the feasibility of the proposed IES by presenting a novel emotion-aware mobile application, iSmile, and evaluate the system performance based on this application.
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
Recently, Large Language Models (LLMs) have demonstrated amazing text
understanding and generation capabilities. However, even strong LLMs may
still learn incorrect knowledge from the training corpus, as well as some
knowledge that is outdated over time. Direct secondary fine-tuning with data
containing new knowledge may be ineffective in updating knowledge due to the
conflict between old and new knowledge. In this paper, we propose a new
paradigm for fine-tuning called F-Learning (Forgetting before Learning), which
is based on parametric arithmetic to achieve forgetting of old knowledge and
learning of new knowledge. Experimental results on two publicly available
datasets demonstrate that our proposed F-Learning can obviously improve the
knowledge updating performance of both full fine-tuning and LoRA fine-tuning.
Moreover, we have also discovered that forgetting old knowledge by subtracting
the parameters of LoRA can achieve a similar effect to subtracting the
parameters of full fine-tuning, and sometimes even surpass it significantly.
Comment: 8 pages, 2 figures, 2 table
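The parametric-arithmetic forgetting step described above, subtracting the parameter shift induced by fine-tuning on the old knowledge before learning the new, can be sketched as follows. This is a minimal illustration under the assumption that "forgetting" is a scaled subtraction of the old-knowledge fine-tuning delta; the function name and the scaling factor lam are hypothetical:

```python
import numpy as np

def forget(theta_pre, theta_old_ft, lam=1.0):
    # theta_pre: pre-trained parameters (name -> array).
    # theta_old_ft: parameters after fine-tuning on the old knowledge.
    # Parametric arithmetic: subtract the fine-tuning shift, scaled by
    # lam, from the pre-trained weights to "forget" the old knowledge.
    # Learning of new knowledge then proceeds by ordinary fine-tuning
    # starting from the returned weights.
    return {name: theta_pre[name] - lam * (theta_old_ft[name] - theta_pre[name])
            for name in theta_pre}
```

The abstract's observation that subtracting LoRA parameters works comparably would correspond to computing the same delta from low-rank adapter weights instead of full fine-tuned weights.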